Force-induced unzipping of DNA with long-range correlated sequence 
We consider force-induced unzipping transition for a heterogeneous DNA model with a long-range 
correlated base-sequence. It is shown that as compared to the uncorrelated situation, long-range 
correlations smear the unzipping phase-transition, change its universality class and lead to non-self- 
averaging: the averaged behavior strongly differs from the typical ones. Several basic scenarios for 
this typical behavior are revealed and explained. The results can be relevant for explaining the 
biological purpose of long-range correlations in DNA. 



Introduction. Structural transformations of DNA under changing
external conditions are of primary importance for molecular
biology and biophysics [ref]. They take place in the transcription
of genetic information from DNA and in the duplication of DNA
during cell division [ref]. The common physical scenario of both
these processes is the unwinding of the double-stranded structure
of DNA under the influence of external forces. Whereas theoretical
and experimental studies of thermal denaturation (melting) of DNA
have a long history, force-induced unzipping has been actively
investigated only relatively recently [ref]; for a concise review
and more references see [ref]. The research in this field is
motivated by the new generation of micromanipulation experiments
[ref]. For the theoretical understanding of the subject both
homopolymer and heteropolymer models of DNA were studied [ref].

The main purpose of the present letter is to make the next step
towards real DNAs and to analyze force-induced unzipping for a
DNA model in which the correlation structure of the base-sequence
is taken into account. Indeed, one of the main differences between
DNA and other biopolymers [ref] is that the base-sequence of the
former displays long-range correlations (1/f-noise spectrum)
[ref]; for a review see [ref]. Recall that the base-sequence of a
DNA molecule consists of purines (A and G) and pyrimidines (C and
T). They constitute the genetic code carried by DNA. Initial
studies reported long-range correlations for non-coding regions of
DNA, while more recent results show that certain types of them can
also exist in coding regions [ref]. Moreover, systematic changes
were found in the structure of correlations depending on the
evolutionary category of the DNA carrier [ref]. In spite of the
ubiquity of long-range correlations in DNA structures, their
biological reason remains basically unexplored.

We will show below that long-range correlations 
present in the base-sequence of DNA make its behavior 
under the unzipping external force essentially non-self- 
averaging: there are several widely different scenarios of 
behavior which specifically depend on the concrete struc- 
ture of the base-sequence and are not reproduced by the 
averaged behavior. This is in contrast to DNAs with 
short-range correlated base-sequence whose behavior in 
the vicinity of the unzipping transition is perfectly
self-averaging: almost every molecule behaves (in the
thermodynamic limit) similarly to the average.

The model we will work with takes into account the minimal set of
physical ingredients needed to describe force-induced unzipping.
i) A DNA molecule lies along the x-axis between the points $x = a$
and $x = L$. ii) Only inter-strand (hydrogen) bonds of the molecule
are considered; they are located at points $x_i$, $a < x_i < L$,
$i = 1, \dots, M$. Any bond can be in one of two states: bound or
broken. We choose the overall energy scale in such a way that the
latter case contributes to the Hamiltonian a binding energy
$\phi(x_i)$, whereas the former case brings nothing. Different
types of bonds have different binding energies, so $\phi(x_i)$ is a
random quantity with an average $\langle\phi\rangle$:
$\phi(x_i) = \langle\phi\rangle + \eta(x_i)$. iii) A force acts on
the left end $x = a$ of the molecule, pulling apart the two
strands. Thus, if a bond $x_i$ is broken, all the bonds $x_j$ with
$j < i$ are broken as well. Each broken bond additionally brings to
the Hamiltonian a term $-\mathcal{F}$, where $\mathcal{F}$ is
proportional to the acting force. iv) Summarizing all of these, one
comes to the Hamiltonian
$H(x) = -\mathcal{F}x + \sum_{i=1}^{x}\phi(x_i) = (\langle\phi\rangle - \mathcal{F})x + \sum_{i=1}^{x}\eta(x_i)$,
where $x$ is the number of broken bonds. In the thermodynamical
limit, where $L$ and $M$ are large, one applies the continuum
description with $x$ being a real number, $a < x < L$, and ends up
with the following Hamiltonian and partition function:

$$H(x) = f(x-a) + \int_a^x ds\,\eta(s), \qquad Z = \int_a^L dx\, e^{-\beta H(x)}, \qquad (1)$$

where $f = \langle\phi\rangle - \mathcal{F}$ and $\beta = 1/T$ is
the inverse temperature ($k_B = 1$). It remains to specify the
properties of the noise $\eta$. Strictly speaking, it can take
values corresponding to the inter-strand bonds AT and GC. However,
within the adopted description we assume it is a gaussian
stationary process with an autocorrelation function
$K(t - t') = \langle\eta(t)\eta(t')\rangle$ to be specified later
on. The model given by (1) and by $K(t) \propto \delta(t)$ (white
noise) is well-known, and was used to describe interfaces, random
walks in disordered media, and aspects of population dynamics
[ref]. It was recently applied to the unzipping transition in DNA
[ref].
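
As an illustration of the model, the following minimal sketch evaluates a discrete version of the partition function in (1) by direct enumeration over the number $x$ of broken bonds and returns the thermal average of $x$. It assumes $a = 0$, one bond per unit length and uncorrelated gaussian $\eta_i$; the function names, the noise strength and the chain length are illustrative choices and are not taken from the text.

```python
import math
import random


def energies(f, eta):
    """Return H(x) = sum_{i<=x} (f + eta_i) for x = 0, 1, ..., M."""
    h = [0.0]
    for e in eta:
        h.append(h[-1] + f + e)
    return h


def mean_broken_bonds(f, eta, T=1.0):
    """Thermal average of x with Boltzmann weights exp(-H(x)/T)."""
    h = energies(f, eta)
    h_min = min(h)  # shift energies to avoid overflow in exp
    weights = [math.exp(-(hx - h_min) / T) for hx in h]
    z = sum(weights)
    return sum(x * w for x, w in enumerate(weights)) / z


if __name__ == "__main__":
    rng = random.Random(0)
    M = 2000
    eta = [rng.gauss(0.0, 0.3) for _ in range(M)]  # assumed white noise
    for f in (1.0, 0.5, 0.25):
        print(f, mean_broken_bonds(f, eta))
```

For a homogeneous chain ($\eta_i = 0$) the sum is geometric and the average reduces to $1/(e^{\beta f} - 1)$ for a long chain, which gives a simple check of the enumeration.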

Reduction to a stochastic differential equation. In Eq. (1) one
fixes $L$, and views $a$ as a parameter varying from the highest
possible value $L$, where $Z = 0$, to the lowest possible value,
which we define to be $a = 0$. The quantity $t = -a$ will thus
monotonically increase and can be interpreted as a time variable.
Differentiating $Z$ in (1) over $a$ and changing the variable as
$t = -a$, one gets:



$$\frac{dZ}{dt} = 1 - \beta f Z - \beta\eta(t) Z, \qquad -L < t < 0, \qquad (2)$$



where we used $\eta(t) = \eta(-t)$, as follows from the gaussian
stationary property of the noise. This is a Langevin equation with
a multiplicative noise. From (2) one can obtain a stochastic
equation for $F = -T\ln Z$:



$$\frac{dF}{dt} + V'(F) = \eta(t), \qquad V(F) = T^2 e^{\beta F} - fF. \qquad (3)$$



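The order parameter of the problem is the number of broken bonds.
Along with its average and variance it is defined for $t = 0$ as

$$x = \partial_f F, \qquad \bar{x} = \partial_f \langle F\rangle, \qquad \Delta x^2 = \overline{x^2} - \bar{x}^2 = -T\,\partial_f \bar{x}. \qquad (4)$$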
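
The reduction to an evolution in "time" has a simple discrete counterpart, sketched below under the assumption of one bond per unit length: adding one bond at the left end gives $Z_{a-1} = 1 + e^{-\beta(f+\eta_a)} Z_a$, which plays the role of (2) (here the recursion starts from $Z_M = 1$, the single fully zipped state, instead of $Z = 0$ in the continuum). The sketch also checks numerically that $x = \partial_f F$ with $F = -T\ln Z$ reproduces the thermal average number of broken bonds. Function names and parameter values are illustrative assumptions.

```python
import math
import random


def log_partition(f, eta, T=1.0):
    """ln Z from the backward recursion Z_{a-1} = 1 + exp(-(f + eta_a)/T) Z_a.

    The recursion starts from the fully zipped end, Z_M = 1, and is carried
    out for ln Z so that very large Z (small f) does not overflow.
    """
    log_z = 0.0  # ln Z_M = ln 1
    for e in reversed(eta):
        y = log_z - (f + e) / T  # logarithm of the second term
        # ln(1 + exp(y)), written in an overflow-safe form
        log_z = max(y, 0.0) + math.log1p(math.exp(-abs(y)))
    return log_z


def broken_bonds_from_free_energy(f, eta, T=1.0, df=1e-5):
    """x = dF/df with F = -T ln Z, by a central finite difference."""
    f_plus = -T * log_partition(f + df, eta, T)
    f_minus = -T * log_partition(f - df, eta, T)
    return (f_plus - f_minus) / (2.0 * df)


def broken_bonds_direct(f, eta, T=1.0):
    """Thermal average of x by direct enumeration, for comparison."""
    h, hs = 0.0, [0.0]
    for e in eta:
        h += f + e
        hs.append(h)
    h_min = min(hs)
    w = [math.exp(-(v - h_min) / T) for v in hs]
    return sum(x * wx for x, wx in enumerate(w)) / sum(w)


if __name__ == "__main__":
    rng = random.Random(1)
    eta = [rng.gauss(0.0, 0.3) for _ in range(200)]  # assumed noise sample
    print(broken_bonds_from_free_energy(0.5, eta))
    print(broken_bonds_direct(0.5, eta))
```

The recursion needs only one pass over the sequence, which is why it is convenient when many values of $f$ or many noise realizations have to be scanned.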

Exponentially correlated noise. As the first example we shall
consider Ornstein-Uhlenbeck (OU) noise
$K(t) = (D/\tau)\, e^{-|t|/\tau}$, where $D$ is the intensity, and
$\tau$ is the correlation time. Although for a finite $\tau$ this
noise is short-range correlated, we believe that it correctly
catches the basic trends of the more general situation when
changing $\tau$ from $0$ to some large value. Note that the
white-noise situation is recovered for $\tau \to 0$.
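
A noise with the autocorrelation $K(t) = (D/\tau)e^{-|t|/\tau}$ can be sampled on a grid with the exact one-step update of the OU process. The sketch below is one possible way to do this and to check the stationary variance $D/\tau$ and the exponential decay empirically; the function names, the grid step `dt` and the sample size are assumptions made for illustration.

```python
import math
import random


def ou_noise(n, dt, D, tau, rng):
    """Sample n points of stationary OU noise with K(t) = (D/tau) exp(-|t|/tau).

    Uses the exact one-step update eta' = rho*eta + sqrt(var*(1-rho^2))*g,
    with rho = exp(-dt/tau) and stationary variance var = D/tau.
    """
    var = D / tau
    rho = math.exp(-dt / tau)
    kick = math.sqrt(var * (1.0 - rho * rho))
    eta = rng.gauss(0.0, math.sqrt(var))  # start in the stationary state
    out = []
    for _ in range(n):
        out.append(eta)
        eta = rho * eta + kick * rng.gauss(0.0, 1.0)
    return out


def autocorrelation(series, lag):
    """Empirical <eta(t) eta(t+lag)> for a zero-mean stationary series."""
    n = len(series) - lag
    return sum(series[i] * series[i + lag] for i in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(42)
    sample = ou_noise(200000, 0.1, 2.0, 1.0, rng)
    print(autocorrelation(sample, 0), 2.0)                   # K(0) = D/tau
    print(autocorrelation(sample, 10), 2.0 * math.exp(-1))   # K(tau)
```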

To handle (3) one differentiates it over $t$ and uses the
generating equation for the OU process,
$\tau\dot{\eta} = -\eta + \sqrt{D}\,\xi(t)$, where $\xi(t)$ is a
white gaussian noise:
$\langle\xi(t)\xi(t')\rangle = 2\delta(t - t')$. Introducing
$s = t/\sqrt{\tau}$ one gets [ref]:



$$\frac{d^2F}{ds^2} = -\gamma(F)\,\frac{dF}{ds} - V'(F) + \frac{\sqrt{D}}{\tau^{1/4}}\,\xi(s), \qquad (5)$$



where $\gamma(F) = 1/\sqrt{\tau} + \sqrt{\tau}\,V''(F)$. Recall
that Eq. (5) has the same form as a Langevin equation for a
particle with unit mass in the potential $V(F)$, subjected to a
white noise and to a friction with a coefficient $\gamma(F)$.
$V(F)$ is confining only for $f > 0$: $V(F) \to \infty$ for
$F \to \pm\infty$. As is well known [ref], for sufficiently long
times one can neglect the inertial term $d^2F/ds^2$, provided that
at least one of the following conditions is satisfied: (i) the
dependence on $F$ in $\gamma(F)$ is weak; (ii) $\gamma(F)$ is
sufficiently large. If $V''(F)$ is of order one, then the second
condition is satisfied both for large and small $\tau$. If
$V''(F)$ is small then the first condition is satisfied. After
neglecting the inertial term in (5), the remainder is an ordinary
white-noise Langevin equation and, by means of standard methods
[ref], can be transformed into a Fokker-Planck equation for the
distribution function $P(F, s) = \langle\delta(F - F[s])\rangle$,
where $F[s]$ is a particular, noise-dependent solution of (5):



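$$\frac{\partial P}{\partial s} = \frac{\partial}{\partial F}\left[\frac{V'(F)}{\gamma(F)}\,P(F)\right] + \frac{D}{\sqrt{\tau}}\,\frac{\partial}{\partial F}\,\frac{1}{\gamma(F)}\,\frac{\partial}{\partial F}\,\frac{P(F)}{\gamma(F)}. \qquad (6)$$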

For large times (lengths), i.e. for $L \gg 1$ and
$t \propto s \to 0$, any solution of (6) tends to the stationary
distribution (see e.g. [ref] for a general proof) obtained from (6)
by putting $\partial_s P = 0$:

$$P_{\rm st}(F) = \mathcal{N}\,\gamma(F)\,\exp\left[-\frac{V(F)}{D} - \frac{\tau\,[V'(F)]^2}{2D}\right], \qquad (7)$$

where $\mathcal{N}$ is the normalization factor. The white-noise,
$\tau \to 0$, limit of $P_{\rm st}(F)$ was obtained in [ref].

The critical domain of the model corresponds to $f \to +0$, where
the average energy cost for breaking a hydrogen bond tends to
zero. Our aim is to compare in this domain the behavior of
$\bar{x}$ for a finite $\tau$ with that for $\tau = 0$ (white
noise), so as to determine the effect of the noise correlation.
Recall from [ref] that for $\tau = 0$ a simple formula exists:
$\bar{x} = (T^2/D)\,\psi'(\mu)$, where $\mu = Tf/D$ and
$\psi'(\mu) = d^2[\ln\Gamma(\mu)]/d\mu^2$. Thus for
$f \propto \mu \to 0$, $\bar{x}$ becomes large:
$\bar{x} = D/f^2$. One can explain this by noting that for
$f \to 0$ the potential $V(F)$ in (3) ceases to be confining, and
the particle escapes to infinity. Note that here the random
quantity $x$ is concentrated around its average:
$\Delta x^2/\bar{x}^2 \propto f \to 0$. Thus, in the present
context a given DNA molecule with a typical base-sequence does not
have individuality: its behavior coincides with the averaged one.
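
The white-noise formula can be evaluated with a few lines of code. The sketch below assumes the form $\bar{x} = (T^2/D)\,\psi'(\mu)$ with $\mu = Tf/D$, where the prefactor $T^2/D$ is the one consistent with the small-$f$ limit $\bar{x} = D/f^2$ quoted in the text; the trigamma function $\psi'$ is computed from its recurrence and standard asymptotic series. Function names are illustrative.

```python
import math


def trigamma(z):
    """psi'(z) = d^2 ln Gamma(z) / dz^2 for z > 0.

    Uses the recurrence psi'(z) = psi'(z + 1) + 1/z^2 to shift the argument
    above 10, then the standard asymptotic series.
    """
    if z <= 0.0:
        raise ValueError("z must be positive")
    total = 0.0
    while z < 10.0:
        total += 1.0 / (z * z)
        z += 1.0
    z2 = z * z
    series = 1.0 / z + 1.0 / (2.0 * z2) + (1.0 / z) * (
        1.0 / (6.0 * z2) - 1.0 / (30.0 * z2 * z2) + 1.0 / (42.0 * z2 ** 3)
    )
    return total + series


def mean_broken_bonds_white(f, D, T=1.0):
    """White-noise average order parameter, x_bar = (T^2/D) psi'(T f / D)."""
    mu = T * f / D
    return (T * T / D) * trigamma(mu)


if __name__ == "__main__":
    D, T = 10.0, 1.0
    for f in (1.0, 0.1, 0.01, 0.001):
        x_bar = mean_broken_bonds_white(f, D, T)
        print(f, x_bar, x_bar / (D / f ** 2))  # the ratio tends to 1 as f -> 0
```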

$P_{\rm st}(F)$ and $\bar{x}$ can be expressed via Kummer functions
and then easily studied numerically. Fig. 1 shows that although
the behavior of $\bar{x}$ for very small $f$ does not depend much
on $\tau$, such a dependence does exist for moderately small
values of $f$: finite $\tau$'s smear the small-$f$ singularity and
thus increase the stability of the DNA molecule, since larger
external forces $\mathcal{F}$ are demanded to achieve the same
amount of broken bonds.




FIG. 1: The order parameter $\bar{x}$ versus $f$ for
Ornstein-Uhlenbeck noise with $D = 10$, $T = 1$. From right to
left: $\tau = 0$, $\tau = 10$, $\tau = 100$.

Long-range correlations will be modelled via a stationary gaussian
noise with an autocorrelation function:

$$K(t) = \langle\eta(t)\eta(0)\rangle = \sigma|t|^{-\alpha}, \qquad (8)$$

where $0 < \alpha < 1$ is the exponent characterizing the
long-range correlation, and where $\sigma$ is the intensity.
Although the real noise distributions in DNA can be much more
complicated [ref], (8) is certainly the minimal model of noise
which allows one to study long-range correlations.

To start with, let us consider the case $\alpha = 0$, which does
not have a direct physical interest and does not allow the strict
thermodynamical limit, but, as seen later, is still able to
provide a relevant insight. The noise is now completely frozen:
$\eta(s)$ in (1) does not depend on $s$, and due to this the
problem is easily solved from (1):

$$\bar{x} = L\,\langle g(\eta)\rangle, \qquad g(\eta) = \frac{1}{\beta L(f+\eta)} - \frac{1}{e^{\beta L(f+\eta)} - 1}, \qquad (9)$$

where $\langle\dots\rangle$ is taken over the zero-average gaussian
random quantity $\eta$ whose dispersion is $\sigma$. If $\beta L$
is large, $g(\eta)$ behaves as the step-function,
$g(\eta) \simeq \theta(-\eta - f)$: for any single realization of
the noise there is a sharp phase transition with a jump at the
realization-dependent point $f = -\eta$ (non-self-averaging). In
contrast, due to the integration over $\eta$ in (9), the behavior
of $\bar{x}$ is smooth, and there remains only a crossover between
small $\bar{x}/L$ for a large $f$ and $\bar{x} = L/2$ for $f = 0$:
the sharp transition disappears.
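
The frozen-noise case can be checked directly. For $\eta(s) = \eta$ and $a = 0$, Eq. (1) gives $Z = (1 - e^{-b})/[\beta(f+\eta)]$ with $b = \beta L(f+\eta)$, so the fraction of broken bonds for one realization is $g = 1/b - 1/(e^{b} - 1)$. The sketch below, which takes this expression as its starting assumption, evaluates $g$ in a numerically safe way and averages it over a gaussian $\eta$ of variance $\sigma$ by simple quadrature; function names, grid size and cut-offs are illustrative.

```python
import math


def g(b):
    """Fraction of broken bonds for frozen noise, with b = beta*L*(f + eta).

    g(b) = 1/b - 1/(exp(b) - 1), evaluated in a numerically safe way.
    """
    if abs(b) < 1e-4:
        return 0.5 - b / 12.0  # leading terms of the Taylor series
    if b > 700.0:
        return 1.0 / b  # exp(b) would overflow; the second term is ~ 0
    return 1.0 / b - 1.0 / math.expm1(b)


def averaged_fraction(f, sigma, beta_L, n=4001, width=8.0):
    """x_bar / L: average of g over gaussian eta with variance sigma.

    Simple trapezoidal quadrature on a symmetric grid of +-width std devs.
    """
    std = math.sqrt(sigma)
    h = 2.0 * width * std / (n - 1)
    total = 0.0
    norm = 0.0
    for k in range(n):
        eta = -width * std + k * h
        w = math.exp(-eta * eta / (2.0 * sigma))
        if k == 0 or k == n - 1:
            w *= 0.5
        total += w * g(beta_L * (f + eta))
        norm += w
    return total / norm


if __name__ == "__main__":
    for f in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f, averaged_fraction(f, sigma=1.0, beta_L=1000.0))
```

Since $g(b) + g(-b) = 1$, the average at $f = 0$ is exactly $1/2$, while a single realization jumps from nearly 0 to nearly 1 when $f$ crosses $-\eta$.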

We return to Eq. (2) with the noise $\eta(t)$ characterized by
(8). Our aim is to obtain a Fokker-Planck equation for the
probability density $P(Z,t) = \langle\delta(Z - Z[t])\rangle$,
where $Z[t]$ is a noise-dependent solution of (2). Differentiating
$P(Z,t)$ over $t$ and using (2), one gets:



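$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial Z}\left[(1 - \beta f Z)P\right] + \beta\,\frac{\partial}{\partial Z}\, Z\,\langle\eta(t)\,\delta(Z - Z[t])\rangle. \qquad (10)$$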



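To handle the last term, one uses the fact that the noise is
gaussian and applies Novikov's theorem [ref], to obtain

$$\langle\eta(t)\,\delta(Z - Z[t])\rangle = \int_{-L}^{t} ds\, K(t-s)\left\langle\frac{\delta}{\delta\eta(s)}\,\delta(Z - Z[t])\right\rangle = -\frac{\partial}{\partial Z}\int_{-L}^{t} ds\, K(t-s)\left\langle\delta(Z - Z[t])\,\frac{\delta Z[t]}{\delta\eta(s)}\right\rangle, \qquad (11)$$

where $\delta/\delta\eta(s)$ is the variational derivative, the
equation for which is obtained from (2). Solving this equation and
using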



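$$Z[s] = \exp\left[-\int_s^t du\left(\frac{1}{Z[u]} - \beta f - \beta\eta(u)\right)\right] Z[t], \qquad (12)$$

also obtained from (2), one finally gets

$$\langle\eta(t)\,\delta(Z - Z[t])\rangle = \beta\,\frac{\partial}{\partial Z}\, Z\int_{-L}^{t} ds\, K(t-s)\left\langle\delta(Z - Z[t])\,\exp\left[-\int_s^t \frac{du}{Z[u]}\right]\right\rangle. \qquad (13)$$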



Eqs. (10, 13) are exact, but from now on approximations have to be
applied to get a closed equation for $P(Z,t)$. Note from (2) the
following relation valid in the stationary state:
$\langle 1/Z\rangle = \beta f$. For $f \to +0$ this relation can be
satisfied only if the corresponding stationary distribution tends
to become non-normalizable due to its large-$Z$ behavior. Thus, we
can search for this distribution assuming that the characteristic
values of $Z$ are large. For this one takes the thermodynamical
limit $L \gg 1$, $t \to 0$, makes partial integration in the RHS
of (13), uses (2),
and gets for <p(Z, s) = J* du/Z[rj, u] (/3/ < 1): 



<P(Z,s) 



Z[r), s] 







duur](u) 
Z[r), u] 



du u 
o Z 2 [ri,u] 



(14) 



Now the last term can be neglected due to the above large-$Z$
property. Assuming additionally that the magnitude of the noise
$\eta(t)$ is small, one can estimate the second term in the RHS as
being at least of order $O(1/Z^2)$ and neglect it as well. For the
first term in the RHS of (14) one uses (12) to express $Z(s)$ by
$Z(0)$, which due to the delta-function in (13) can be substituted
by $Z$. The noise in the resulting equation for $\varphi$ is again
neglected, and then $\varphi$ is determined from:



Zcj>(Z,s) = se-^+^ s . 
Thus the stationary distribution reads 



Pst(Z) = 



ZV{Z) 



exp 



uV(u) 



u 



(15) 



(16) 



where $\mathcal{N}$ is the normalization (the lower limit of
integration is not specified, since it can be absorbed into
$\mathcal{N}$), and where



$$\mathcal{D}(Z) = \int_0^{\infty} ds\, K(s)\, e^{-\varphi(Z,s)}. \qquad (17)$$



To study the critical behavior of $\bar{x}$, one needs two
asymptotic regimes found from (15, 17):
$\mathcal{D}(Z) \propto Z^{1-\alpha} e^{\beta f(1-\alpha)Z}$ for
$\beta f Z \ll 1$, while $\mathcal{D}(Z)$ is constant for
$\beta f Z \gg 1$. Substituting these into (16) and selecting the
most divergent terms, one gets:



_ Ax 2 a {13 f) c 



(18) 



where a = cr^ 1_a -'^ 2 ~ Q ^. Due to the above weak-noise as- 
sumption, ( |l8| ) represents the leading term of the small-a 
expansion. It is seen that in contrast to the white-noise
situation the behavior of $\bar{x}$ is smeared, and that $x$ is a
strongly non-self-averaging quantity: $\kappa > 1$ for
$\beta f \ll 1$ (recall [ref] that in the white-noise case
$\bar{x} \sim f^{-2}$ and thus $\kappa \sim f \to 0$ for
$f \to 0$). Both these results are in contrast to the qualitative
predictions made in [ref]: $\bar{x} \sim f^{-2/\alpha}$,
$\kappa \sim f^{-1+2/\alpha} \to 0$, which mean that the small-$f$
singularity is stronger and $x$ is even more self-averaging than
in the white-noise case. We think that this discrepancy is due to
the inapplicability of the reasoning made in [ref] to finite
temperatures.

Typical scenarios of unzipping. The above results on
non-self-averaging indicate that $\bar{x}(f)$ is not directly
relevant for experiments which are carried out on single DNA
molecules: one should study different realizations of the noise
and identify typical, i.e. frequently met, scenarios of behavior.
Results of an extensive numerical investigation of this problem
will be reported elsewhere [ref]. Here we discuss some
representative examples. By means of direct numerical enumeration
of $L = 10^4$ discrete base-pairs we studied the behavior of the
(unaveraged) order parameter $x$ as a function of $f$. The
long-range correlated gaussian discrete-time stochastic process
was generated following the optimized recipes proposed in [ref].
As compared to (8), the noise was regularized at short distances
for obvious numerical reasons. We focus on the thermodynamical
domain where $f$ is not very small, and thus comparison with the
theory is possible. In the (regularized) white-noise case the
simulations are in perfect agreement with the theory: $x$ is
self-averaging and $\bar{x} \propto f^{-2}$ is reproduced. In
contrast to that, a strong non-self-averaging is present for the
long-range correlated noise. Moreover, we found several radically
different scenarios of the typical behavior. Two extremal ones
among them are presented in Figs. 2, 3. The first one is present
in nearly 12% of all realizations and is demonstrated by Fig. 2.
It is characterized by a very smooth, non-critical behavior of
$x(f)$ for $f > 0$. Fig. 3 presents a strictly different
situation: $x(f)$ increases by several jumps followed by very flat
regions; $x(0)$ is either equal to its maximal possible value $L$
or close to it. This phase-transition scenario is met in nearly
45% of all realizations. Other typical realizations are
intermediate between these two extremes. Our discussion of the
frozen noise made after (9) allows one to explain this
jump-plateau structure. A sizeable portion of long-range
correlated noise realizations can be qualitatively visualized as
several pieces of the frozen noise with different $\eta$ put next
to each other. Now recall from (9) that every sufficiently long
piece of that type has a single first-order phase transition with
a jump proportional to its length.
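
The jump-plateau picture can be reproduced with a toy realization made of a few frozen pieces. The sketch below is only an illustration of this qualitative argument, not of the simulations reported here: the piece values, the piece lengths and the temperature are hypothetical, and the order parameter is obtained by direct enumeration of a discrete chain.

```python
import math


def mean_broken_bonds(f, eta, T):
    """Thermal average of x for H(x) = sum_{i<=x} (f + eta_i)."""
    h, hs = 0.0, [0.0]
    for e in eta:
        h += f + e
        hs.append(h)
    h_min = min(hs)  # shift energies to avoid overflow in exp
    w = [math.exp(-(v - h_min) / T) for v in hs]
    return sum(x * wx for x, wx in enumerate(w)) / sum(w)


# Hypothetical realization: three frozen pieces of 100 bonds each,
# given as (eta value, number of bonds), starting from the pulled end.
PIECES = [(-0.3, 100), (0.2, 100), (-0.6, 100)]
ETA = [value for value, length in PIECES for _ in range(length)]

if __name__ == "__main__":
    for f in (0.10, 0.15, 0.25, 0.28, 0.40):
        print(f"f = {f:.2f}  x = {mean_broken_bonds(f, ETA, T=0.1):7.2f}")
```

For these assumed values the first piece opens at $f = 0.3$, while the second and third pieces open together at $f = 0.2$, where the energy gained on the third piece compensates the cost of the second; between the jumps $x(f)$ stays nearly flat.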

In conclusion, we have shown that long-range correlations in the
base-sequence of a model DNA drastically influence its unzipping
under external force: i) the behavior of the average order
parameter in the critical regime is smeared; ii) the situation is
essentially non-self-averaging: there are several scenarios of
typical unzipping which do not coincide with the averaged
behavior; iii) long-range correlations increase the adaptability
of the molecule, since in some typical scenarios it becomes more
stable with respect to the force, while in others the unzipping
phase transition is amplified. Which scenario will be selected
depends on the detailed structure of the base-sequence. Some of
the above tendencies, e.g. the smearing, are seen already for a
short-range correlated base-sequence. We hope that these results
will contribute to the understanding of the role and the purpose
of long-range correlations in DNA.

FIG. 2: $x(f)$ for three realizations of the noise within one
class of typicality. $T = \sigma = 1$, $L = 10^4$ and
$\alpha = 0.5$.

FIG. 3: $x(f)$ for three realizations of the noise within another
class of typicality. $T = \sigma = 1$, $L = 10^4$ and
$\alpha = 0.5$.